FASTCUDA: Open Source FPGA Accelerator & Hardware-Software Codesign Toolset for CUDA Kernels
Authors
Abstract
Using FPGAs as hardware accelerators that communicate with a central CPU is becoming common practice in embedded design, but there is not yet a standard methodology and toolset to support this path. Languages such as CUDA and OpenCL, on the other hand, provide standard development environments for Graphics Processing Unit (GPU) programming. FASTCUDA is a platform that provides the software toolset, hardware architecture, and design methodology needed to adapt the CUDA approach efficiently to a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low-power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach.
Similar resources
Design Methodology for Offloading Software Executions to FPGA
A field-programmable gate array (FPGA) is a flexible solution for offloading part of the computation from a processor. In particular, it can accelerate a computationally heavy part of a software application, e.g., in DSP, where small kernels are repeated often. Since application code for a processor is software, a design methodology is needed to convert the code...
System Level Verification and Performance Analysis for FPGA Accelerated Computers
Zhimin Chen, Xu Guo, Ambuj Sinha, and Patrick Schaumont. Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24060, USA. E-mail: {chenzm, xuguo, ambujs87, schaum}@vt.edu. As an accelerator, Field Programmable Gate Array (FPGA) has become a great potential to assist a general-purpose proce...
A High Performance FPGA-Based Sorting Accelerator with a Data Compression Mechanism
Sorting is an extremely important computation kernel that has been accelerated in many fields, such as databases, image processing, and genome analysis. Given the advent of the Internet of Things (IoT) era, driven by progress in mobile technology, a sorting method is needed that works in any environment, not only high-performance systems such as servers but also low computationa...
Space Codesign: A SystemC Framework for Fast Exploration of Hardware/Software Systems
Electronic System Level has brought new abstractions for designing systems, which most designers are not familiar with. The Space Codesign™ SystemC design framework allows designers to easily model hardware/software-based systems, starting from a high level model and refining down to the chip. We propose a rapid system prototyping toolset that permits co-monitoring of specifications, effortles...
Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond
With the appearance of the heterogeneous platform OpenPower, many-core accelerator devices have been coupled with Power host processors for the first time. To utilize their full potential, it is worth investigating performance-portable algorithms that allow choosing the best-fitting hardware for each domain-specific compute task. Suiting even the high level of parallelism on modern GPGP...
Journal title:
Volume, issue
Pages -
Publication date: 2012